Intrapatient HIV-1 evolution is dominated by selection on the protein level in the arms race with the adaptive
immune system. When cytotoxic CD8+ T-cells or neutralizing antibodies target a new epitope, the virus often
escapes via nonsynonymous mutations that impair recognition. Synonymous mutations do not affect this inter- 
play and are often assumed to be neutral. We analyze longitudinal intrapatient data from the C2-V5 part of the 
envelope gene (env) and observe that synonymous derived alleles rarely fix even though they often reach high 
frequencies in the viral population. We find that synonymous mutations that disrupt base pairs in RNA stems 
flanking the variable loops of gp120 are more likely to be lost than other synonymous changes, hinting at a direct
fitness effect of these stem-loop structures in the HIV-1 RNA. Computational modeling indicates that these synonymous
mutations have a (Malthusian) selection coefficient of the order of -0.002 and that they are brought up
to high frequency by hitchhiking on neighboring beneficial nonsynonymous alleles. The patterns of fixation of 
nonsynonymous mutations estimated from the longitudinal data and comparisons with computer models suggest 
that escape mutations in C2-V5 are only transiently beneficial, either because the immune system is catching up 
or because of competition between equivalent escapes. 



I. INTRODUCTION 

HIV-1 evolves rapidly within a single host during the course 
of the infection. This evolution is driven by strong selection
imposed by the host immune system via cytotoxic CD8+
T-cells (CTLs) and neutralizing antibodies (nAbs) (Rambaut
et al, 2004) and facilitated by the high mutation rate (Abram 
et al, 2010; Mansky and Temin, 1995). When the host devel- 
ops a CTL or nAb response against a particular HIV-1 epitope, 
mutations in the viral genome that reduce or prevent recogni- 
tion of the epitope frequently emerge. Escape mutations in 
epitopes targeted by CTLs typically evolve during early in- 
fection and spread rapidly through the population (McMichael 
et al., 2009). During chronic infection, the most rapidly evolving
parts of the HIV-1 genome are the variable loops V1-V5
in the envelope protein gp120, which change to avoid recognition
by nAbs. Escape mutations in env, the gene encoding
gp120, spread through the population within a few months.
Consistent with this time scale, it is found that serum from a 
particular time typically neutralizes virus extracted more than 
3-6 months earlier but not contemporary virus (Richman et al, 
2003). 

Escape mutations are selected because they change the 
amino acid sequence of viral proteins in a way that reduces an- 
tibody binding or epitope presentation. Conversely, synony- 
mous mutations are commonly used as approximately neutral 
markers in studies of viral evolution. Neutral markers are very 
useful since their dynamics can be compared to that of pu- 
tatively functional sites to detect purifying or directional se- 
lection (Bhatt et al, 2011; Chen et al, 2004; Hurst, 2002). 
In addition to maintaining protein function and avoiding the 
adaptive immune recognition, however, the HIV-1 genome 
has to ensure efficient processing and translation, nuclear ex- 
port, and packaging into the viral capsid: all these processes 
operate at the RNA level and are sensitive to synonymous 
changes since these processes often depend on RNA folding. 
For example, the HIV-1 rev response element (RRE) in env
enhances nuclear export of full length or partially spliced
viral transcripts via a complex stem-loop RNA structure
(Fernandes et al., 2012). Another well studied case is the
interaction between viral reverse transcriptase, viral ssRNA,
and the host tRNA$^{\mathrm{Lys3}}$: the latter is required for
priming reverse transcription (RT) and is bound by a
pseudoknotted RNA structure in the viral 5' untranslated
region (Barat et al., 1991; Paillart et al., 2002).

Even in the absence of important RNA structures, syn- 
onymous codons do not evolve completely neutrally. Some 
codons are favored over others in many species (Plotkin and 
Kudla, 2011). Recent studies have shown that genetically
engineered HIV-1 strains with altered codon usage can in some
cases produce more viral protein, but in general replicate 
less efficiently (Keating et al, 2009; Li et al, 2012; Ngum- 
bela et al, 2008). Codon deoptimization has been suggested 
as an attenuation strategy for polio and influenza (Coleman 
et al, 2008; Mueller et al, 2010). Purifying selection beyond 
the protein sequence is therefore expected (Forsdyke, 1995; 
Snoeck et al., 2011), and it has been shown that rates of
evolution at synonymous sites vary along the HIV-1 genome
(Mayrose et al., 2007). Positive selection through the host adaptive
immune system, however, is restricted to changes in the amino 
acid sequence. 

In this paper, we characterize the dynamics of synonymous 
mutations in env and show that a substantial fraction of these 
mutations is deleterious. We argue that synonymous muta- 
tions reach high frequencies via genetic hitchhiking due to 
limited recombination in HIV-1 populations (Batorsky et al, 
2011; Neher and Leitner, 2010). We then compare our ob- 
servations to computational models of HIV-1 evolution and 
derive estimates for the effect of synonymous mutations on 
fitness. Extending the analysis of fixation probabilities to the 
nonsynonymous mutations, we show that time-dependent se- 
lection or strong competition of escape mutations inside the 
same epitope are necessary to explain the observed patterns 
of fixation and loss. 
II. RESULTS

The central quantity we investigate is the probability of
fixation of a mutation, conditional on its population frequency.
A neutral mutation segregating at frequency $\nu$ has a
probability $P_{\mathrm{fix}}(\nu) = \nu$ to spread through the
population and fix; in the remaining cases, i.e. with probability
$1 - \nu$, it goes extinct. As illustrated in the inset of
Fig. 1A, this is a consequence of the fact that (i) exactly one
individual in the current population will be the common ancestor
of the entire future population at a particular site and
(ii) this ancestor has a probability $\nu$ of carrying the
mutation (assuming the neutral mutation is not preferentially
associated with genomes of high or low fitness).
Deleterious or beneficial mutations fix less or more often than
neutral ones, respectively. Fig. 1 shows the time course of
the frequencies of all synonymous and nonsynonymous mutations
observed in env, C2-V5, in patient p10 (Shankarappa
et al., 1999). Despite many synonymous mutations reaching
high frequency, few fix (panel 1A); in contrast, many
nonsynonymous mutations fix (panel 1B). Strictly speaking, no
mutation in the HIV-1 population ever fixes because the
mutation rate and the population size are large. Therefore, we
define "fixation" and "loss" operationally: a mutation is fixed
when the ancestral allele is no longer observed in the sample,
and lost when the mutation itself is no longer observed.
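
The neutral expectation $P_{\mathrm{fix}}(\nu) = \nu$ can be checked with a minimal Wright-Fisher sketch. This is an illustration only: the population size, the number of replicates, and the function name `neutral_fixation_fraction` are assumptions made for the example and are not taken from the analysis in this paper.

```python
import random

def neutral_fixation_fraction(nu0, pop_size=40, replicates=2000, seed=1):
    """Fraction of neutral Wright-Fisher runs in which the allele fixes."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = round(nu0 * pop_size)  # initial number of mutant genomes
        while 0 < count < pop_size:
            freq = count / pop_size
            # Every offspring draws a random parent: binomial resampling.
            count = sum(rng.random() < freq for _ in range(pop_size))
        fixed += count == pop_size
    return fixed / replicates
```

Calling `neutral_fixation_fraction(0.25)` or `neutral_fixation_fraction(0.75)` should return values close to the starting frequency, up to sampling noise of a few percent.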



A. Synonymous polymorphisms in env, C2-V5, are mostly 
deleterious 

We study the dynamics and fate of synonymous mutations
more quantitatively by analyzing data from seven patients
from Shankarappa et al. (1999) and Liu et al. (2006) as well
as three patients from Bunnik et al. (2008) (patients whose
viral population was structured were excluded from the
analysis; see Methods and Figure S1). The former data set is
restricted to the C2-V5 region of env, while the data from
Bunnik et al. (2008) cover the majority of env. Considering
all mutations in a frequency interval
$[\nu_0 - \delta\nu, \nu_0 + \delta\nu]$ at some time $t$, we
calculate the fraction that are still observed at later times
$t + \Delta t$. Plotting this fraction against the time interval
$\Delta t$, we see that most synonymous mutations segregate for
roughly one year and are lost much more frequently than
expected (panel 2A). The long-time probability of fixation,
$P_{\mathrm{fix}}$, is shown as a function of the initial
frequency $\nu_0$ in panel 2B (red line). We find that
$P_{\mathrm{fix}}$ of synonymous mutations is far below the
neutral expectation in C2-V5. Outside of C2-V5, using data from
Bunnik et al. (2008) only, we find no such reduction in
$P_{\mathrm{fix}}$. Restricted to the C2-V5 region, the sequence
samples from Bunnik et al. (2008) are fully compatible with
data from Shankarappa et al. (1999). The nonsynonymous
mutations roughly follow the neutral expectation
(blue line) - a point to which we will return below.
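
The binning procedure described above can be sketched as follows. This is a simplified illustration, not the code used for the figures: it assumes each allele is represented by a list of frequencies at consecutive time points, conditions only on the first time the allele enters the frequency window, and treats a final frequency of exactly 1 or 0 as fixation or loss. The function name and data layout are hypothetical.

```python
def fixation_probability(trajectories, nu_min, nu_max):
    """Estimate P_fix for alleles observed in [nu_min, nu_max].

    trajectories: list of lists of derived-allele frequencies,
    one list per allele, ordered in time.
    Returns None if no allele in the window has a resolved fate.
    """
    fixed = lost = 0
    for traj in trajectories:
        # First time point at which the allele is inside the window.
        start = next((i for i, nu in enumerate(traj)
                      if nu_min <= nu <= nu_max), None)
        if start is None or start == len(traj) - 1:
            continue  # never in the window, or no later observation
        final = traj[-1]
        if final == 1.0:
            fixed += 1
        elif final == 0.0:
            lost += 1
        # Alleles still polymorphic at the last sample are skipped.
    total = fixed + lost
    return fixed / total if total else None
```

For example, with three alleles that pass through the window 0.25 to 0.45 and of which one fixes and two are lost, the function returns 1/3, which would lie below the neutral expectation for that window.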

When interpreting these results for the fixation probabilities,
it is important to distinguish between random mutations
and polymorphisms observed at a certain frequency since the
latter have already been filtered by selection. A polymorphism
could be beneficial to the virus and on its way to fixation. In
this case, we expect that it fixes almost surely given that we
see it at high frequency. If, on the other hand, the polymorphism
is deleterious it must have reached a high frequency by
chance (genetic drift or hitchhiking), and we expect that
selection drives it out of the population again. Hence our
observations suggest that many of the synonymous polymorphisms
at intermediate frequencies in the part of env that includes
C2-V5 are deleterious, while outside this region most
polymorphisms are roughly neutral. Note that this does not imply
that all synonymous mutations outside C2-V5 are neutral - only
those mutations observed at high frequencies, which have
experienced selection for some time, tend to be neutral.

Figure 1 Time series of frequencies of synonymous (A) and
nonsynonymous (B) derived alleles in env, C2-V5, from patient
10 (Shankarappa et al., 1999). While many nonsynonymous
mutations fix, few synonymous mutations do even though they are
frequently observed at intermediate frequencies. Colors indicate
the position of the site along the C2-V5 region (blue to red).
Inset: the fixation probability $P_{\mathrm{fix}}$ of a neutral
mutation is simply the likelihood that the future common ancestor
at this position is currently carrying it, i.e. the mutation
frequency $\nu$.



B. Synonymous mutations in C2-V5 tend to disrupt conserved 
RNA stems 

One possible explanation for the lack of fixation of synonymous
mutations in C2-V5 is the presence of secondary structures in
the viral RNA, the disruption of which is deleterious to the
virus (Forsdyke, 1995; Sanjuan and Borderia, 2011; Snoeck et al.,
2011).

Figure 2 Fixation and loss of synonymous mutations. Panel A)
shows how quickly synonymous mutations are purged from the
populations. Specifically, the figure shows the fraction of
mutations that are still observed after $\Delta t$ days,
conditional on being observed in one of the three frequency
intervals (different colors). In each frequency interval, the
fraction of synonymous mutations that ultimately survive is the
fixation probability $P_{\mathrm{fix}}$ conditional on the initial
frequency. The neutral expectation $P_{\mathrm{fix}} = \nu_0$ is
indicated by dashed horizontal lines. Panel B) shows the fixation
probability of derived synonymous alleles as a function of
$\nu_0$. Polymorphisms within C2-V5 fix less often than expected
for neutral mutations (indicated by the diagonal line). This
suppression is not observed in other parts of env or for
nonsynonymous mutations. The horizontal error bars on the
abscissa are bin sizes, the vertical ones the standard deviation
after 100 patient bootstraps of the data. Data from Refs. (Bunnik
et al., 2008; Liu et al., 2006; Shankarappa et al., 1999).

The propensity of nucleotides in the HIV-1 genome to
form base pairs has been measured using the SHAPE assay,
a biochemical reaction preferentially altering unpaired bases
(the HIV-1 genome is a single stranded RNA) (Watts et al.,
2009). The SHAPE assay has shown that the variable regions
V1-V5 tend to be unpaired, while the conserved regions
between those variable regions form stems. We aligned
the within-patient sequence samples to the reference NL4-3
strain used by Watts et al. (2009) and thereby assigned
SHAPE reactivities to most positions in the alignment. We
then calculated the distributions of SHAPE reactivities for
synonymous polymorphisms that fixed or were subsequently
lost (only polymorphisms with frequencies above 15%). As
shown in Fig. 3A, the reactivities of fixed alleles (red
histogram) are systematically larger than those of alleles that
are lost (blue) (Kolmogorov-Smirnov test on the cumulative
distribution, $p \approx 0.002$). In other words, alleles that
are likely to break RNA helices are also more likely to revert
and finally be lost from the population. The average over all
mutations that are not observed (green) lies between those that
fix and those that get lost. Note that this analysis will be
sensitive only at positions where the base pairing pattern of
NL4-3 agrees with that of each patient's initial consensus
sequence (it is thus statistically conservative).
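
The comparison of the two reactivity distributions rests on the two-sample Kolmogorov-Smirnov statistic, the largest vertical distance between the two empirical cumulative distributions. A minimal pure-Python sketch of that statistic is given below; the function name is hypothetical and any input values are placeholders, not SHAPE data from this study.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum distance between the ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d_max = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both ECDFs past every value equal to x (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d_max = max(d_max, abs(i / len(a) - j / len(b)))
    return d_max
```

If SciPy is available, `scipy.stats.ks_2samp` returns the same statistic together with a p-value.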

To test the hypothesis that mutations in C2-V5 are lost be- 
cause they break stems in the conserved stretches between the 
variable loops, we consider mutations in variable loops and 
conserved parts separately. The greatest depression in fixa- 
tion probability is observed in the conserved stems, while the 
variable loops show little deviation from the neutral signature, 
see Fig. 3B. This is consistent with important stem structures 
in conserved regions between loops. 

In addition to RNA secondary structure, we have consid- 
ered other possible explanations for a fitness cost of some 
synonymous mutations, in particular codon usage bias (CUB). 
HIV-1 is known to prefer A-rich codons over highly ex- 
pressed human codons (Jenkins and Holmes, 2003; Kuyl and 
Berkhout, 2012). We do not find, however, any evidence for 
a contribution of average CUB to the ultimate fate of synony- 
mous alleles; consistently, HIV-1 does not seem to adapt its 
codon usage to its human host cells at the macroevolutionary 
level (Kuyl and Berkhout, 2012). 



C. Deleterious mutations are brought to high frequency by 
hitchhiking 

While the observation that some fraction of synonymous 
mutations is deleterious is not unexpected, it seems odd that 
we observe them at high population frequency and that the 
fixation probability is reduced only in parts of the genome (in 
C2-V5 but not in the rest of env; compare the red triangle 
line versus the green square line in Fig. 2B). The region C2- 
V5 undergoes frequent adaptive changes to evade recognition 
by neutralizing antibodies (Richman et al., 2003; Williamson, 
2003). Due to the limited amount of recombination in HIV- 
1 (Batorsky et al, 2011; Neher and Leitner, 2010), deleteri- 
ous mutations that are linked to adaptive variants can reach 
high frequency. This process is known as hitchhiking (Smith 
and Haigh, 1974) or genetic draft (Gillespie, 2000; Neher and 
Shraiman, 2011). Hitchhiking is apparent in Fig. 1, which 
shows that many mutations change rapidly in frequency as a 
flock. 

Figure 3 Permissible synonymous mutations tend to be unpaired.
Panel A) shows the distribution of SHAPE reactivities among sites
at which synonymous mutations fixed (red), sites at which mutations
reached frequencies above 15% but were subsequently lost (blue), and
sites at which no mutations were observed (green) (all categories are
restricted to the regions V1-V5 ± 100 bp). Sites at which mutations
fixed tend to have higher SHAPE reactivities, corresponding to less
base pairing, than those at which mutations are lost. Sites at which no
mutations are observed show an intermediate distribution of SHAPE
values. Panel B) shows the fixation probability of synonymous
mutations in C2-V5 separately for variable regions V3-V5 and the
connecting conserved regions C2-C4 that harbor RNA stems. As
expected, the fixation probability is lower inside the conserved regions.
Data from Refs. (Bunnik et al., 2008; Liu et al., 2006; Shankarappa
et al., 1999).

The approximate magnitude of the deleterious effects can
be estimated from Fig. 2A, which shows the distribution of
times after which synonymous alleles at intermediate
frequencies become fixed or lost. The typical time to loss is of
the order of 500 days. If this loss is driven by the deleterious
effect of the mutation, this corresponds to deleterious effects
$s_d$ of the order of -0.002 per day. (This is only an average
estimate: every single mutation is expected to have a slightly
different fitness effect.)
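
One way to read this order-of-magnitude estimate is through deterministic selection on a single site, ignoring drift, hitchhiking, and recombination (a simplifying assumption made here for illustration only). Writing $s_d$ for the deleterious effect and $T_{\mathrm{loss}}$ for the typical time to loss, the log-odds of the allele frequency change linearly in time, so the time scale of loss is set by $1/|s_d|$ up to a logarithmic factor of order one:

```latex
\frac{d\nu}{dt} = s_d\,\nu(1-\nu)
\quad\Longrightarrow\quad
\log\frac{\nu(t)}{1-\nu(t)} = \log\frac{\nu_0}{1-\nu_0} + s_d\,t ,
\qquad
|s_d| \sim \frac{1}{T_{\mathrm{loss}}}
\approx \frac{1}{500\ \mathrm{days}} = 0.002\ \mathrm{day}^{-1}.
```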

To get a better idea of the range of parameters that are com- 
patible with the observations and our interpretation, we per- 
formed computer simulations of evolving viral populations 
assuming a mix of positive and purifying selection and rare re- 
combination. For this purpose, we use the simulation package 
FFPopSim, which includes a module dedicated to intrapatient 
HIV evolution (Zanini and Neher, 2012). For each simulation 
run, we specify the deleterious effect of synonymous muta- 
tions, the fraction of synonymous mutations that are delete- 
rious, the escape rate (selection coefficient) of adaptive non- 
synonymous mutations and the rate at which previously un- 
targeted epitopes become targeted (the latter determines the 
number of sites available for escape). Note that the escape 
rate is the sum of two factors: (i) the beneficial effect due to 
the ability to evade the immune system minus (ii) the fitness 
cost of the mutation in terms of structure, stability, etc. Net 
escape rates in chronic infections have been estimated to be 
on the order of $\epsilon = 0.01$ per day (Asquith et al., 2006; Neher
and Leitner, 2010). 

Figure 4 Distribution of selection coefficients on synonymous sites.
Panel A) The depression in $P_{\mathrm{fix}}$ depends on the deleterious
effect size of synonymous alleles. This parameter also reduces
synonymous diversity, measured by the probability of a derived allele
to be found at intermediate frequencies $P_{\mathrm{interm}}$ (first
inset). Panel B) To assess the parameter space that affects synonymous
fixation and diversity, we run 2400 simulations with random parameters
for deleterious effect size, fraction of deleterious synonymous sites,
average escape rate $\epsilon$ (color, blue to red corresponds to
$10^{-2.5}$ to $10^{-1.5}$ per day), and rate of introduction of new
epitopes (marker size, from $10^{-3}$ to $10^{-2}$ per day). Only
simulations that reproduce the synonymous diversity and fixation
patterns observed in data are shown. These simulations demonstrate
that deleterious effects are around -0.002 and a large fraction of the
synonymous mutations needs to be deleterious. As expected, larger
$|s_d|$ require larger $\epsilon$. Parameters are chosen from prior
distributions uniform in logspace as indicated by the red rectangle
(see methods).

Fig. 4A shows simulation results for the fixation probability
and the synonymous diversity for different deleterious effects
of synonymous mutations. We quantify synonymous diversity
via $P_{\mathrm{interm}}$, the fraction of sites with an allele at
frequency $0.25 < \nu < 0.75$. The synonymous diversity observed
in patient data is indicated in the figure. To quantify the
depression of the fixation probability, we calculate the area $A$
between the measured fixation probability and the diagonal, which
is the neutral expectation (Fig. 4A, lower inset). If no fixation
happens, the area will be -0.5; if every mutation fixes,
the area will be +0.5. In HIV-1 infected patients, we find
$A_{\mathrm{syn}} \approx -0.2$ for synonymous changes and
$A_{\mathrm{nonsyn}} \approx 0$ for nonsynonymous changes. In the
three simulations shown in Fig. 4A, the fixation probability of
synonymous alleles decreases from the neutral expectation
($A_{\mathrm{syn}} \approx 0$) to zero ($A_{\mathrm{syn}} \approx -0.5$)
as their fitness cost increases; the synonymous diversity plummets
as well, as deleterious mutations are selected against.
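
The area statistic can be sketched with a trapezoid rule over the binned fixation probabilities. This is an illustrative implementation under the assumption that the frequency grid runs from 0 to 1; the binning actually used for the figures may differ, and the function name is hypothetical.

```python
def fixation_area(nu, p_fix):
    """Area between P_fix(nu) and the neutral diagonal P_fix = nu.

    nu: increasing list of initial frequencies (ideally from 0 to 1).
    p_fix: estimated fixation probability at each frequency.
    """
    area = 0.0
    for k in range(1, len(nu)):
        d_left = p_fix[k - 1] - nu[k - 1]
        d_right = p_fix[k] - nu[k]
        # Trapezoid rule on the difference to the diagonal.
        area += 0.5 * (d_left + d_right) * (nu[k] - nu[k - 1])
    return area
```

With `nu = [0.0, 0.5, 1.0]`, a flat `p_fix` of zeros gives -0.5, a flat `p_fix` of ones gives +0.5, and `p_fix == nu` gives 0, matching the limiting cases stated in the text.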

To map the parameter range of the model that is compatible
with the data, we repeatedly simulated the evolution with
random choices for the parameters within certain bounds, see
Fig. 4B. Among all simulations, we select the ones that show
$A_{\mathrm{syn}}$ and $P_{\mathrm{interm}}$ as observed in the data,
i.e., a large depression in fixation probability of synonymous
mutations but, simultaneously, a moderately high synonymous
diversity. Specifically, Fig. 4B shows parameter combinations for
which we found $A_{\mathrm{syn}} < -0.15$ and
$0.0025 < P_{\mathrm{interm}} < 0.010$. These conditions
indicate that a high fraction ($> 0.8$) of sites has to be
deleterious with effect size $|s_d| \approx 0.002$. This result
agrees well with the expectation based on the fixation/extinction
times above (see Fig. 2A). The results are plausible: (i) a
substantial depression in $P_{\mathrm{fix}}$ requires pervasive
deleterious mutations, otherwise the majority of observed
polymorphisms are neutral and no depression is observed; (ii) in
order to hitchhike, the deleterious effect size has to be much
smaller than the escape rate, otherwise the double mutant has
little or no fitness advantage (consistent with this argument,
larger deleterious effects in Fig. 4B correspond to larger escape
rates); and (iii) mutations with a deleterious effect smaller than
approximately 0.001 behave neutrally, consistent with the typical
coalescent times observed in HIV-1.

The above simulations show that hitchhiking can explain 
the observation of deleterious mutations that rarely fix. How- 
ever, in a simple model where nonsynonymous escape mu- 
tations are unconditionally beneficial, they almost always fix 
once they reach high frequencies - $A_{\mathrm{nonsyn}}$ is well above zero.
This is incompatible with the blue line in Fig. 2B: in an HIV- 
1 infection, nonsynonymous mutations at high frequency of- 
ten disappear again, even though many are at least transiently 
beneficial. Inspecting the trajectories of nonsynonymous mu- 
tations suggests the rapid rise and fall of many alleles. We test 
two possible mechanisms that are biologically plausible and 
could explain the transient rise of nonsynonymous mutations: 
time-dependent selection and within-epitope competition. 

The former hypothesis can be formulated as follows: if the 
immune system recognizes the escape mutant before its fix- 
ation, the mutant might cease to be beneficial and disappear 
soon, despite its quick initial rise in frequency. In support 
of this idea, Bunnik et al. (2008) and Richman et al. (2003)
report antibody responses to escape mutants. These responses
are delayed by a few months, roughly matching the average 
time needed by an escape mutant to rise from low to high 
frequency. To model this type of behavior, we assume that 
antibody responses against escape mutations arise with a rate 
proportional to the frequency of the escape mutation and abol- 
ish the benefit of the escape mutations. As expected, this type 
of time-dependent selection retains the potential for hitchhiking,
but reduces fixation of nonsynonymous mutations. Figure S3
shows that $P_{\mathrm{fix}}$ of synonymous mutations is not affected
by this change, while $P_{\mathrm{fix}}$ of nonsynonymous mutations
approaches the diagonal as the rate of recognition of escape
mutants is increased.

In the alternative hypothesis, several different escape
mutations within the same epitope might arise almost
simultaneously and start to spread. Their benefits are not additive,
because each of them is essentially sufficient to escape. As a
consequence, several escape mutations rise to high frequency
rapidly, while the one with the smallest cost in terms of
replication, packaging, etc. is most likely to eventually fix. The
emergence of multiple sweeping nonsynonymous mutations
in real HIV-1 infections has been shown (Bar et al., 2012;
Moore et al., 2009). This scenario has been explicitly
observed in the evolution of resistance to 3TC, where the
mutation M184V is often preceded by M184I (Hedskog et al.,
2010). Similarly, AZT resistance often emerges via the
competing TAM and TAM1 pathways. Within-epitope competition
can be implemented in the model through epistasis
between escape mutations. While each mutation is individually
beneficial, combining the mutations is deleterious (no extra
benefit, but additional costs). Again, we find that the potential
for hitchhiking is little affected by within-epitope competition
but that the fixation probability of nonsynonymous
polymorphisms is reduced. With roughly six mutations per epitope,
the simulation data are compatible with observations; see
Figure S4. The two scenarios are not mutually exclusive, and
both may be important in HIV-1 evolution.
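
The epistasis behind within-epitope competition can be illustrated with a toy fitness function: the escape benefit is granted once if any escape mutation is present, while every mutation carried pays its own cost. This is a sketch of the idea only; the function name, the tuple encoding of genotypes, and the numerical values in the usage note are hypothetical and do not reproduce the simulation code.

```python
def escape_fitness(genotype, escape_benefit, costs):
    """Malthusian fitness of a genotype within a single epitope.

    genotype: tuple of 0/1 flags, one per candidate escape site.
    escape_benefit: benefit of escaping, granted at most once.
    costs: fitness cost of each candidate mutation (same length).
    """
    benefit = escape_benefit if any(genotype) else 0.0
    cost = sum(c for g, c in zip(genotype, costs) if g)
    return benefit - cost
```

With an assumed benefit of 0.01 per day and assumed costs of 0.001 and 0.003, both single mutants are fitter than the wild type, the cheaper one is the fittest genotype, and the double mutant is less fit than either single mutant.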



III. DISCUSSION 

By analyzing the fate of mutations in longitudinal data of 
HIV-1 env evolution, we demonstrate selection against syn- 
onymous substitutions in the comparatively conserved regions 
C2-C4 of the env gene. Comparison with biochemical studies 
of base pairing propensity in the RNA genome of HIV-1 indicates
that these mutations are deleterious, at least in part, because 
they disrupt stems in RNA secondary structures. Computa- 
tional modeling shows that these mutations have deleterious 
effects on the order of 0.002 and that they are brought to high 
frequency through linkage to adaptive mutations. 

The fixation and extinction times and probabilities provide
rich yet simple summary statistics that are useful to
characterize longitudinal sequence data and to compare them to
models via computer simulations. A method that is similar in
spirit to ours has recently been used in a longitudinal study of
influenza evolution (Strelkowa and Laessig, 2012). The central
quantity used in that article, however, is a ratio between prop- 
agators of nonsynonymous and synonymous mutations. The 
latter is used as an approximately neutral control; this method 
can therefore not be used to investigate synonymous changes 
themselves. More generally, evolutionary rates at synony- 
mous sites are often used as a baseline to detect purifying or 
diversifying selection at the protein level (Hurst, 2002). It has 
been pointed out, however, that the rate of evolution at
synonymous sites varies considerably along the HIV-1 genome
(Mayrose et al, 2007) and that this variation can confound es- 
timates of selection on proteins substantially (Ngandu et al, 
2008). 

A functional significance of the insulating RNA structure 
stems between the hypervariable loops has also been proposed 
previously (Sanjuan and Borderia, 2011; Watts et al, 2009) 
and conserved RNA structures exist in different parts of the
HIV-1 genome. Since there are of course many ways to build 
an RNA stem in a particular location, we do not necessarily 
expect a strong signal of conservation in cross-sectional data. 
Our analysis, however, is able to quantify the fitness effect 
of RNA structure within single infections and demonstrates 
how selection at synonymous sites can alter genetic diversity 
and dynamics. The observed hitchhiking highlights the im- 
portance of linkage due to infrequent recombination for the 
evolution of HIV-1 (Batorsky et al, 2011; Josefsson et al., 
2011; Neher and Leitner, 2010). The recombination rate has
been estimated to be on the order of $\rho = 10^{-5}$ per base and
day. It takes roughly $t_{\mathrm{sw}} = \epsilon^{-1} \log \nu_0^{-1}$
generations for an escape mutation with escape rate $\epsilon$ to
rise from an initially low frequency $\nu_0 \approx \mu$ to
frequency one. This implies that a region of length
$l = (\rho\, t_{\mathrm{sw}})^{-1} = \epsilon / (\rho \log \nu_0^{-1})$
remains linked to the adaptive mutation. With $\epsilon = 0.01$,
we have $l \approx 100$ bases. Hence
we expect strong linkage between the variable loops and the 
flanking sequences, but none far beyond the variable regions, 
consistent with the lack of signal outside of C2-V5. In case of 
much stronger selection - such as observed during early CTL 
escape or drug resistance evolution - the linked region is of 
course much larger (Nijhuis et al, 1998). 
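
The linkage estimate above can be reproduced numerically, taking the sweep time as $\log(1/\nu_0)/\epsilon$ and the linked length as the inverse of the recombination rate times the sweep time. The escape rate and recombination rate are the values quoted in the text; the initial frequency of 3e-5, of the order of the per-site mutation rate, is an assumption made for this sketch, as are the function names.

```python
import math

def sweep_time(escape_rate, nu0):
    """Time for an escape mutation to rise from frequency nu0 to ~1."""
    return math.log(1.0 / nu0) / escape_rate

def linked_length(escape_rate, recombination_rate, nu0):
    """Length (in bases) that stays linked during the sweep."""
    return 1.0 / (recombination_rate * sweep_time(escape_rate, nu0))

# Values from the text, except nu0, which is an assumed order of magnitude.
EPSILON = 0.01   # escape rate per day
RHO = 1e-5       # recombination rate per base per day
NU0 = 3e-5       # assumed initial frequency, of the order of the mutation rate

length = linked_length(EPSILON, RHO, NU0)
```

Under these assumptions the sweep takes roughly a thousand days and the linked region is close to 100 bases; a tenfold larger escape rate would extend the linked region tenfold.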

While classical population genetics assumes that the domi- 
nant stochastic force is genetic drift, i.e. non-heritable fluctu- 
ations in offspring number, our results show that stochasticity 
due to linked selection is much more important. Such fluc- 
tuations have been termed genetic draft by Gillespie (2000). 
Genetic draft in facultatively sexual populations such as HIV-1
has been characterized by Neher and Shraiman (2011).
Importantly, large population sizes are compatible with low
diversity and fast coalescence when draft dominates over drift.

Contrary to naive expectations, the adaptive escape muta- 
tions do not seem to be unconditionally beneficial. Otherwise 
we would observe almost sure fixation of nonsynonymous
mutations once they reach intermediate frequencies. Instead,
we find that the fixation probability of a nonsynonymous
mutation is roughly given by its frequency. There are several
possible explanations for this observation. Similar to synony- 
mous mutations, the majority of nonsynonymous mutations 
could be weakly deleterious, and the adaptive and deleterious 
parts could conspire to yield a more neutral-like averaged fix- 
ation probability. While weakly deleterious nonsynonymous 
mutations certainly exist and will contribute to a depression 
of the fixation probability, we have seen that a substantial 
depression requires that weakly deleterious nonsynonymous 
polymorphisms at high frequency greatly outnumber escape 
mutations. This seems unlikely, since nonsynonymous diver- 
sity exceeds synonymous diversity despite the overall much 
greater constraints on the amino acid sequence. 

Alternatively, the lack of fixation could be due to a
time-dependent environment through an immune system that is
catching up, or competition between mutations that mediate
escape within the same epitope. We explore both of these
possibilities and find that both produce the desired effect in
computer models. Furthermore, there is experimental evidence
in support of both of these hypotheses. Serum from
HIV-1 infected individuals typically neutralizes the virus that
dominated the population a few (3-6) months earlier (Richman
et al., 2003). This suggests that escape mutations cease
to be beneficial after a few months and might revert if they
come with a fitness cost. Deep sequencing of regions of env
after antibody escape has revealed multiple escape mutations
in the same epitope (Bar et al., 2012; Moore et al., 2009).
Presumably, each one of these mutations is sufficient for escape
but most combinations of them do not provide any additional
benefit to the virus. Hence only one mutation will spread and
the others will be driven out of the population although they
transiently reach high frequencies. The rapid emergence of
multiple escape mutations in the same epitope implies a large
effective population size that explores all necessary point
mutations rapidly. A similar point has been made recently by
Boltz et al. (2012) in the context of preexisting drug
resistance mutations.

Our results emphasize the inadequacy of independent-site
models of HIV-1 evolution and of the common assumption that
selection is time-independent or additive. If genetic variation
is only transiently beneficial, existing estimates of the strength
of selection (Batorsky et al., 2011; Neher and Leitner, 2010)
could be substantial underestimates. Furthermore, weak
conservation and time-dependent selection result in estimates of
evolutionary rates that depend on the time interval of
observation, with lower rates across longer intervals. This implies
that deep nodes in phylogenies might be older than they appear.

IV. METHODS 

A. Sequence data collection 

Longitudinal intrapatient viral RNA sequences were col- 
lected from published studies (Bunnik et al, 2008; Liu et al, 
2006; Shankarappa et al, 1999) and downloaded from the 
Los Alamos National Laboratory (LANL) HIV sequence 
database (Kuiken et al., 2012). The samples from some patients
show substantial population structure and were discarded (see
Figure S1); a total of 11 patients with 4-23 time points each and
approximately 10 sequences per time point were analyzed. The time
intervals between consecutive time points ranged from 1 to 34
months, most of them between 6 and 10 months.



B. Sequence analysis 

The sequences were translated and the resulting amino acid
sequences were aligned, separately for each patient, to each other
and to the NL4-3 reference sequences using Muscle (Edgar, 2004).
Within each patient, the consensus nucleotide sequence at the
first time point was used to classify alleles as "ancestral" or
"derived" at all sites. Sites with a high frequency of gaps were
excluded from the analysis to avoid artifactual substitutions due
to alignment errors. Allele frequencies at different time points
were extracted from the multiple sequence alignment.

A mutation was considered synonymous if it did not change 
the amino acid corresponding to the codon, and if the rest of 
the codon was in the ancestral state. Codons with more than
one mutation were discarded. Slightly different criteria for 
synonymous/nonsynonymous discrimination yielded similar 
results. 
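
The classification rule above can be sketched as follows. This is a minimal illustration, not the published analysis script: the function names are hypothetical, the standard genetic code is assumed, and sequences are assumed to be written in the DNA alphabet (T rather than U).

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_mutation(ancestral_codon, derived_codon):
    """Classify a codon change as 'synonymous' or 'nonsynonymous'.

    Returns None when the codon is discarded: no change, more than one
    changed position, or a codon containing gaps or ambiguous bases.
    """
    if ancestral_codon not in CODON_TABLE or derived_codon not in CODON_TABLE:
        return None
    n_changes = sum(a != d for a, d in zip(ancestral_codon, derived_codon))
    if n_changes != 1:
        return None
    same_aa = CODON_TABLE[ancestral_codon] == CODON_TABLE[derived_codon]
    return "synonymous" if same_aa else "nonsynonymous"


def allele_frequency(column, allele):
    """Frequency of one nucleotide in an alignment column, ignoring gaps."""
    called = [base for base in column if base in BASES]
    return called.count(allele) / len(called) if called else None
```

Requiring exactly one changed position implements both conditions at once: the rest of the codon is in the ancestral state, and codons with multiple mutations are dropped.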



C. Fixation probability and secondary structure 

For the estimates of time to fixation/extinction, polymor- 
phisms were binned by frequency and the time to first reach- 
ing either fixation or extinction was stored. The fixation prob- 
ability was determined as the long-time limit of the result- 
ing curves. Mutations that reached high frequency but neither 
fixed nor were lost were classified as "floating", with one ex- 
ception: if they first reached high frequencies within 3 years 
of the last time point, it was assumed they had not had suffi- 
cient time to settle, so they were discarded. 
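
A simplified sketch of the binned fixation estimate is given below. It assumes each polymorphism is stored as a list of frequencies ordered in time, and it uses illustrative thresholds of 0.95 and 0.05 for fixation and extinction; neither the thresholds nor the function names come from the original scripts. The time to fixation or extinction and the 3-year rule for "floating" mutations are left out.

```python
def first_outcome(freqs, start, fixed=0.95, lost=0.05):
    """First event after index `start`: 'fixed', 'lost', or 'unresolved'."""
    for freq in freqs[start + 1:]:
        if freq >= fixed:
            return "fixed"
        if freq <= lost:
            return "lost"
    return "unresolved"


def fixation_probability(trajectories, bin_edges, fixed=0.95, lost=0.05):
    """Fraction of resolved polymorphisms that fix, per frequency bin.

    A trajectory contributes to a bin the first time its frequency falls
    inside that bin. Bins without resolved trajectories yield None.
    """
    probabilities = []
    for low, high in zip(bin_edges, bin_edges[1:]):
        counts = {"fixed": 0, "lost": 0, "unresolved": 0}
        for freqs in trajectories:
            hits = [i for i, f in enumerate(freqs) if low <= f < high]
            if hits:
                counts[first_outcome(freqs, hits[0], fixed, lost)] += 1
        resolved = counts["fixed"] + counts["lost"]
        probabilities.append(counts["fixed"] / resolved if resolved else None)
    return probabilities
```

For neutral alleles the result should follow the diagonal, that is, a fixation probability equal to the frequency of the bin.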

The SHAPE scores quantifying the degree of base pairing of
individual sites in the HIV-1 genome were downloaded from the
journal website (Watts et al., 2009). Wherever possible, SHAPE
reactivities were assigned to sites in the multiple sequence
alignments for each patient through the alignment to the sequence
of the NL4-3 virus used in Watts et al. (2009). Problematic
assignments in indel-rich regions
were excluded from the analysis. The variable loops and 
flanking regions were identified manually starting from the 
annotated reference HXB2 sequence from the LANL HIV 
database (Kuiken et al., 2012). 



D. Computer simulations 

Computer simulations were performed using FFPopSim 
(Zanini and Neher, 2012). Briefly, FFPopSim enables 
individual-based simulations where each site in the genome 
is represented by one bit that can be in one of two states.
Outcrossing rates, crossover rates, mutation rates and arbitrary
fitness functions can be specified. We used a generation time of
1 day, an outcrossing rate of $r = 0.01$ per day (Batorsky et al.,
2011; Neher and Leitner, 2010), a mutation rate of $\mu = 10^{-5}$
(Abram et al., 2010; Mansky and Temin, 1995) and simulated
intrapatient evolution for 6000 days. For simplicity, third
positions of every codon were deemed synonymous and assigned
either a selection coefficient of zero with probability $1 - \alpha$
or a deleterious effect $s_d$ with probability $\alpha$. Mutations
at the first and second positions were assigned strongly
deleterious fitness effects of 0.02. At rate $k_A$, a random locus
in the genome is designated an epitope that can escape by one or
several mutations with an exponentially distributed escape rate
with mean $\epsilon$. Both full-length HIV-1 genomes and env-only
simulations were performed and yielded comparable results. 

The simulations were repeated 2400 times with random choices for
the following parameters: the fraction of deleterious sites
$\alpha$ was sampled uniformly between 0.75 and 1.0; the average
deleterious effect $s_d$ was sampled log-uniformly between
$10^{-4}$ and $10^{-2}$; the average escape rate $\epsilon$ of
escape mutations was sampled log-uniformly between $10^{-2.5}$ and
$10^{-1.5}$; and the rate $k_A$ of new antibody challenges was
sampled log-uniformly between $10^{-3}$ and $10^{-2}$ per
generation. Populations were initialized with a homogeneous founder
population and were kept at an average size of $N = 10^4$
throughout the simulation. After 30 generations of burn-in to
create genetic diversity, new epitopes were introduced at a
constant rate $k_A$.

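
The random parameter choices can be sketched as below, using only Python's standard `random` module. The function name and dictionary keys are illustrative and are not taken from the published scripts; the sketch covers the sampling step only, not the FFPopSim runs themselves.

```python
import random


def sample_parameters(rng):
    """Draw one random parameter set for a simulation run."""

    def log_uniform(low_exponent, high_exponent):
        # Uniform in log10, so every decade is equally likely.
        return 10 ** rng.uniform(low_exponent, high_exponent)

    return {
        "alpha": rng.uniform(0.75, 1.0),     # fraction of deleterious sites
        "s_d": log_uniform(-4, -2),          # average deleterious effect
        "epsilon": log_uniform(-2.5, -1.5),  # average escape rate
        "k_A": log_uniform(-3, -2),          # new epitopes per generation
    }


rng = random.Random(0)  # fixed seed, chosen here only for reproducibility
parameter_sets = [sample_parameters(rng) for _ in range(2400)]
```

Sampling the exponent rather than the value itself keeps small rates from being underrepresented, which matters when a parameter spans one or two orders of magnitude.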
For the models with competition within epitopes, a com- 
plex epistatic fitness landscape was designed such that each 
single mutant is sufficient for full escape. In particular, each 
mutation had a linear effect equal to the escape, but a nega- 
tive epistatic effect of the same magnitude between each pair 
of sites was included. Higher order terms compensated each 
other to make sure that not only double mutants, but all k- 
mutants with k > 1 had the same fitness (see supplementary 
materials). To model recognition of escape variants by the 
immune system catching up, the beneficial effect of an escape 
mutation was set to its previous cost of -0.02 with a probabil- 
ity per generation proportional to the frequency of the escape 
variant. 

For each set of parameters, fixation probabilities and
probabilities of synonymous polymorphisms $P_{\mathrm{interm}}$ were
calculated as averages over 100 repetitions (with different random
seeds).

The areas below or above the neutral fixation probability 
(diagonal line) were estimated from the binned fixation proba- 
bilities using linear interpolation between the bin centers. This 
measure is sufficiently precise for our purposes. In 10 runs out 
of 2400, the highest frequency bin was empty so the fixation 
probability could not be calculated; those runs were excluded 
from Fig. 4B. 
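
One way to compute such areas is sketched below, assuming the fixation probabilities are given at the bin centers. The function name is hypothetical, and the integration covers only the interval between the first and last bin center. A segment that crosses the diagonal is split at the crossing point so that the areas above and below are accumulated separately.

```python
def areas_vs_diagonal(centers, p_fix):
    """Areas between the interpolated fixation curve and the diagonal.

    Returns (area_above, area_below), both non-negative, where the
    diagonal is the neutral expectation p_fix = frequency.
    """
    above = below = 0.0
    points = list(zip(centers, p_fix))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d0, d1 = y0 - x0, y1 - x1  # signed distance from the diagonal
        width = x1 - x0
        if d0 * d1 >= 0:
            # No crossing: a single trapezoid.
            parts = [0.5 * (d0 + d1) * width]
        else:
            # Crossing: two triangles on either side of the zero.
            t = d0 / (d0 - d1)
            parts = [0.5 * d0 * t * width, 0.5 * d1 * (1 - t) * width]
        for area in parts:
            if area > 0:
                above += area
            else:
                below -= area
    return above, below
```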

E. Methods availability 

All analysis and computer simulation scripts, as well as 
the sequence alignments used, are available for download at 

http://git.tuebingen.mpg.de/synmut.
Appendix A: Selection of the patient data




Figure S1 Structure of viral populations and patient selection. Panel A) shows a PCA of all sequences from patient p1 (colors indicate time
from seroconversion, from blue to red). Panel B) shows allele frequency trajectories for nonsynonymous changes in the same patient. Here, 
the blue to red color map corresponds to the position of the allele in env from 5' to 3'. Panels C) and D) show analogous plots for data from 
patient p7. Samples after day 1000 split into two clusters in the PCA and no mutations that arise after day 1000 fix, presumably because they 
are restricted to one subpopulation. All patients like p7 (p4, p7, p8, p9 from ref. Shankarappa et al., 1999 and ACH19542 and ACH19768 
from ref. Bunnik et al, 2008) were excluded from our analysis. 
Appendix B: Synonymous diversity across the HIV genome




[Figure panels: synonymous diversity versus position in gag, pol, env, and nef]



Figure S2 Synonymous diversity across the HIV genome, as quantified by the normalized codon entropy among sequences coding for the 
consensus amino acid. In most parts of the genome, synonymous sites show little conservation. The synonymous diversity peaks at the 
variable regions in env and is reduced in regions under purifying selection (RRE hairpin, second tat/rev exons). The normalized codon entropy 
is calculated as follows (see the script codon_entropy_synonymous_subtypeB.py for the full algorithm): (i) from a subtype B multiple 
sequence alignment (MSA) from the LANL website (filtered sequences only, version 2011) (Kuiken et al., 2012), we calculate the consensus
amino acid at each position in the HIV genome; (ii) we count how often each codon coding for the consensus amino acid appears in the
MSA; (iii) at each amino acid position, we divide by the number of sequences in the MSA that had the consensus amino acid at that position,
obtaining codon frequencies $\nu_c$; (iv) we calculate the codon entropy at each position as $S = -\sum_c \nu_c \log \nu_c$, where $c$ runs over codons that
code for the consensus amino acid at this site; (v) we divide by the maximal codon entropy of that amino acid (e.g. log 2 for twofold degenerate
codons). All parts of env that are part of a different gene (signal peptide, second rev exon) have been excluded from our main analysis, to
avoid contamination by protein selection in a different reading frame. Note that all gap-rich columns of the MSA are stripped from this figure, 
so genes such as env might appear shorter than they actually are. 
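
Steps (i) to (v) can be sketched for a single amino acid position as follows. This is an independent illustration, not the script codon_entropy_synonymous_subtypeB.py: the function names are hypothetical, and the standard genetic code and a DNA alphabet are assumed.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def consensus_amino_acid(codons):
    """Step (i): most common amino acid among the valid codons."""
    amino_acids = [CODON_TABLE[c] for c in codons if c in CODON_TABLE]
    return Counter(amino_acids).most_common(1)[0][0]


def normalized_codon_entropy(codons):
    """Steps (ii)-(v) for the codons observed at one position of the MSA.

    Returns None for amino acids encoded by a single codon, for which
    the maximal entropy is zero.
    """
    consensus = consensus_amino_acid(codons)
    degeneracy = sum(aa == consensus for aa in CODON_TABLE.values())
    if degeneracy < 2:
        return None
    counts = Counter(c for c in codons if CODON_TABLE.get(c) == consensus)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return entropy / math.log(degeneracy)
```

A value of 0 means that a single codon is used for the consensus amino acid in all sequences; a value of 1 means that all synonymous codons are used equally often.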
Appendix C: Time-dependent selection



[Figure panel: Fixation probabilities with time-dependent selection; x-axis: allele frequency $\nu$ from 0.0 to 1.0]



Figure S3 Time-dependent selection reduces fixation of nonsynonymous mutations. The figure compares the fixation probability in the
time-independent model (naive) to a model with time-dependent selection that mimics an evolving immune system. It has been found that virus
is typically neutralized by serum sampled a few months later (Richman et al., 2003) but not by contemporary serum. We model this evolving
immune system by assuming that escaped variants lose their beneficial effect with a rate proportional to the frequency of the escaped variant.
Specifically, the selection effect of the escape mutations is reset to its fitness cost of $-0.02$ with probability

$P_{\mathrm{recognized}}(t) = c\,\nu(t)$,

per generation, where $c$ is a constant coefficient shown in the legend that encodes the overall efficiency of the host immune system. With
increasing probability of recognition, the fixation of frequent escape mutants is reduced, while hitch-hiking of synonymous mutations is not
affected. The precise shape of $P_{\mathrm{fix}}(\nu)$ depends on the details of $P_{\mathrm{recognized}}(t)$, and we do not think that the high $P_{\mathrm{fix}}(\nu)$ for $\nu < 0.2$ is
meaningful. The other parameters for the shown simulations are the following: deleterious effect $s_d = 10^{-3}$, average escape rate $\epsilon = 0.016$,
fraction of deleterious synonymous mutations $\alpha = 0.986$, rate of new epitopes $k_A = 0.0014$ per generation.
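
The recognition rule can be illustrated with a toy model of a single escape mutation. The sketch below is not the FFPopSim simulation: it assumes deterministic frequency dynamics with Malthusian fitness, without drift, recombination or hitchhiking, and the function name, starting frequency and default values are illustrative. Only the recognition step is stochastic, with a probability per generation equal to $c$ times the current frequency.

```python
import math
import random


def simulate_escape(benefit, c, nu0=0.01, cost=-0.02, generations=2000, rng=None):
    """Frequency trajectory of one escape mutation with immune catch-up.

    Each generation the variant is recognized with probability c * nu;
    after recognition its selection coefficient is reset to `cost`.
    Returns (trajectory, generation_of_recognition or None).
    """
    rng = rng or random.Random()
    nu, s, recognized_at = nu0, benefit, None
    trajectory = [nu]
    for generation in range(1, generations + 1):
        if recognized_at is None and rng.random() < c * nu:
            s, recognized_at = cost, generation
        w = math.exp(s)  # relative fitness of the escape variant
        nu = nu * w / (1.0 - nu + nu * w)
        trajectory.append(nu)
    return trajectory, recognized_at
```

With $c = 0$ the variant is never recognized and sweeps to fixation; with large $c$ it is recognized while still rare and declines again, which is the qualitative behavior behind the reduced fixation of frequent escape mutants.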
Appendix D: Within-epitope competition



[Figure panel: Fixation probabilities with complex epitopes]

Figure S4 Competition between escape mutations in the same epitope reduces fixation of nonsynonymous mutations. The figure compares the
fixation probability of models with one, three, or six mutually exclusive escape mutations within the same epitope. Within epitope competition 
results in reduced fixation probabilities of nonsynonymous changes, whereas the synonymous changes behave similarly in all cases. We 
assume that escape can happen at n sites out of 3 consecutive codons and vary n. The fitness landscape of each epitope includes negative 
epistatic terms, so that the joint presence of more than one escape mutation is not any more beneficial for the virus than a single mutation. 
Specifically, each site has two alleles, $\pm 1$, where $-1$ is the ancestral one and $+1$ the derived one; the fitness coefficient of a $k$-tuple of sites
within the epitope is $f_k = (-1)^{k-1} 2^{1-n} \eta_E$, where $\eta_E$ is the escape rate of the epitope drawn from an exponential distribution with mean $\epsilon$
and $n$ is the number of competing escapes in the epitope. In this evolutionary scenario, many escape mutations start to sweep on different
backgrounds within the viral population, but eventually compete and only one of them fixes. The other parameters for the shown simulations
are the following: deleterious effect $s_d = 10^{-3}$, average escape rate $\epsilon = 0.016$, fraction of deleterious synonymous mutations $\alpha = 0.986$, rate
of new epitopes $k_A = 0.0014$ per generation.
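
The claim that all genotypes with at least one escape mutation are equally fit can be checked numerically. The sketch below assumes that the coefficient of every $k$-tuple of sites is $f_k = (-1)^{k-1} 2^{1-n} \eta_E$ and that the fitness of a genotype is the sum over all tuples of the coefficient times the product of the $\pm 1$ alleles; both the reading of the exponent and the function name are assumptions. Under this convention the gain of any escape genotype over the ancestral one is $2\eta_E$.

```python
from itertools import combinations, product
from math import prod


def epitope_fitness(genotype, eta):
    """Fitness of an epitope genotype with alleles -1 (ancestral) or +1."""
    n = len(genotype)
    total = 0.0
    for k in range(1, n + 1):
        f_k = (-1) ** (k - 1) * 2.0 ** (1 - n) * eta
        for sites in combinations(range(n), k):
            total += f_k * prod(genotype[i] for i in sites)
    return total


def escape_gains(n, eta):
    """Fitness gain over the ancestral genotype for every other genotype."""
    ancestral = epitope_fitness((-1,) * n, eta)
    return [
        epitope_fitness(genotype, eta) - ancestral
        for genotype in product((-1, 1), repeat=n)
        if 1 in genotype
    ]
```

For example, `escape_gains(3, 0.016)` returns seven values that are all equal to 0.032 up to rounding, so a second or third escape mutation adds nothing to the first.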



